Using dictionaries
What are dictionaries?
netXtract dictionaries are lists of words identified by a name. netXtract ships with several
such dictionaries, of two different types:
- one built-in special dictionary, called the "Ignored words" dictionary,
contains all the words that should be ignored while indexing a document (such as: the, an,
at, and, then, or, etc.). This dictionary is "special" because it cannot be deleted and
netXtract always uses it as its list of words to ignore.
- all the other dictionaries that netXtract ships with are "custom". This means that you
can create as many such dictionaries as you want, or delete them. A few custom dictionaries
are included for your convenience, and to serve as templates for new ones. You may delete
as many as you like, or even all of them.
How are dictionaries used?
The purpose of the "Ignored words" dictionary is to make netXtract skip common words that are unlikely to be useful
in determining relevance. The purpose of a "custom" dictionary is to allow you to
filter the words in a document. A custom dictionary should be created for a specific area of
expertise or interest. For instance, you may create a dictionary called "Sports" with related words,
such as "soccer", "football", "basketball", "score", etc.
When indexing a document, netXtract can tell you whether the document you are looking
at is significantly relevant to "Sports", based on the frequency and relationships of the words
from your "Sports" dictionary in the current Web document.
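The idea can be illustrated with a minimal Python sketch. It is not netXtract's actual algorithm, which is not described here: the word lists, the function names build_index and relevance, and the scoring rule (the share of indexed word occurrences that belong to the dictionary) are all assumptions chosen for illustration, and the sketch looks only at word frequency, not at relationships between words.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the "Ignored words" dictionary and a custom "Sports" dictionary.
IGNORED_WORDS = {"the", "an", "at", "and", "then", "or", "a", "in"}
SPORTS = {"soccer", "football", "basketball", "score"}

def build_index(text, ignored=IGNORED_WORDS):
    """Count each word in the text, skipping the ignored words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in ignored)

def relevance(index, dictionary):
    """Assumed scoring rule: fraction of indexed occurrences found in the dictionary."""
    total = sum(index.values())
    if total == 0:
        return 0.0
    hits = sum(count for word, count in index.items() if word in dictionary)
    return hits / total

page = "The score in the football match: soccer fans and basketball fans cheered the score."
index = build_index(page)
print(f"Sports relevance: {relevance(index, SPORTS):.2f}")
```

A score near 1.0 would suggest the page is mostly about the dictionary's topic, and a score near 0.0 would suggest it is not.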
How do I use a specific dictionary?
If you want to see only the words from a given dictionary in the index, and no other words, simply
right-click in the netXtract bar and select "Use dictionary", then the dictionary you want to
use to filter the index. As a result, only words that are from the selected dictionary AND are
also present in the document will be displayed in the index.
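In other words, the displayed list is the intersection of the document's index and the selected dictionary. A small Python sketch of that rule follows; the function name filter_index and the sample data are hypothetical and only model the behavior described here, with no dictionary selected meaning that every indexed word is shown.

```python
def filter_index(index, dictionary=None):
    """Return the index entries to display.

    index: mapping of word -> occurrence count for the current document.
    dictionary: set of words to filter by, or None when no dictionary is selected.
    """
    if dictionary is None:
        return dict(index)
    # Keep only words that are in the dictionary AND present in the document.
    return {word: count for word, count in index.items() if word in dictionary}

index = {"score": 2, "football": 1, "match": 1, "fans": 2}
sports = {"soccer", "football", "basketball", "score"}

filtered = filter_index(index, sports)  # only "score" and "football" remain
unfiltered = filter_index(index)        # every indexed word is shown
```

Note that "soccer" and "basketball" do not appear in the filtered result: they are in the dictionary but not in the document.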
When you want to see all the words in the index again, right-click in the netXtract bar and select
"Use dictionary", then "None".
Managing dictionaries
netXtract allows you to create new custom dictionaries, delete them, or edit any existing dictionary.
Editing a dictionary means renaming it and/or adding or removing words.
To manage the netXtract dictionaries, right-click on the netXtract bar and select the "Dictionaries" menu.
A dialog box pops up, displaying all the dictionaries that netXtract knows about. Notice that the first
one ("Ignored words") has the "built-in" type and is marked "in use". Remember that "built-in"
means that you cannot delete it, and that netXtract always uses it when indexing.
- Create: click the "Add" button to create a new custom dictionary. In the new dialog box, enter the name
you'd like for the new dictionary. You may enter words in the dictionary, but you are not required to
(you may create empty dictionaries).
- Delete: select a custom dictionary and click the "Delete" button to delete the selected dictionary. You will
be prompted to confirm the operation, as it cannot be undone. Note that the built-in dictionary
cannot be deleted (the Delete button is not available).
- Edit: select a dictionary and click the "Edit" button to edit the selected dictionary. You may edit any
dictionary, including the built-in one. In the dialog box that pops up you may change the name of the
dictionary and / or add or remove words:
  - to add words, simply type them in the box below the list of words, then click Add (for fast,
  successive word additions, press Enter after typing in the word). The same result is achieved
  from the netXtract Index tab, by right-clicking on a word, then selecting "Add to" to add the
  word to a selected dictionary, or by selecting "Ignore" to add the word to the "Ignored words" list.
  - to delete words, select the word (or hold the Ctrl key and select multiple words with the
  mouse, or use Shift and the arrow keys), then click the Delete button.
- Use: select a custom dictionary and click the "Use" button to use the selected dictionary as a filter
for the current index. For the same result, drag a dictionary from the list and drop it
on the netXtract Index tab. Either way, these operations are the equivalent of right-clicking in
the netXtract bar and selecting "Use dictionary", then a custom dictionary name: the list of indexed
words will be filtered against the words in the selected dictionary.
- Stop using: click "Stop using dictionary" to stop netXtract from using any custom dictionary and display all the
indexed words (note that it will still use the built-in Ignored words dictionary).
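The rules behind these operations can be summarized in a short Python model. This is only an illustrative sketch: the class DictionaryStore and its method names are hypothetical and say nothing about how netXtract stores its dictionaries internally.

```python
class DictionaryStore:
    """Toy model of the Dictionaries dialog (hypothetical, for illustration only)."""

    BUILT_IN = "Ignored words"

    def __init__(self):
        # The built-in dictionary always exists and is always used when indexing.
        self.dictionaries = {self.BUILT_IN: {"the", "an", "at", "and", "then", "or"}}
        self.filter_name = None  # custom dictionary currently used as a filter, if any

    def create(self, name, words=()):
        """Create: a new custom dictionary, which may be empty."""
        if name in self.dictionaries:
            raise ValueError(f"dictionary already exists: {name}")
        self.dictionaries[name] = set(words)

    def delete(self, name):
        """Delete: custom dictionaries only; the built-in one cannot be deleted."""
        if name == self.BUILT_IN:
            raise ValueError("the built-in dictionary cannot be deleted")
        del self.dictionaries[name]
        if self.filter_name == name:
            self.filter_name = None

    def add_words(self, name, *words):
        """Edit: add words to any dictionary, including the built-in one."""
        self.dictionaries[name].update(words)

    def remove_words(self, name, *words):
        """Edit: remove words from any dictionary, including the built-in one."""
        self.dictionaries[name].difference_update(words)

    def use(self, name):
        """Use: filter the current index with a custom dictionary."""
        if name == self.BUILT_IN or name not in self.dictionaries:
            raise ValueError(f"not a custom dictionary: {name}")
        self.filter_name = name

    def stop_using(self):
        """Stop using dictionary: show all indexed words again."""
        self.filter_name = None


store = DictionaryStore()
store.create("Sports", ["soccer", "football"])
store.add_words("Sports", "basketball", "score")
store.add_words("Ignored words", "but")  # the built-in dictionary can be edited
store.use("Sports")
store.stop_using()
```

In this model, calling store.delete("Ignored words") raises an error, which mirrors the unavailable Delete button for the built-in dictionary.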
Note that you do not need to close the Dictionaries window in order to go back to IE. You may
keep it open and it will stay on top of your browser, while you interact with netXtract and IE.